A Visually Enhanced Neural Encoder for Synset Induction
Abstract
The synset induction task is to automatically cluster semantically identical instances, which are often represented by texts and images. Previous works mainly consider the textual parts while ignoring their visual counterparts. However, how to effectively employ visual information to enhance semantic representations for synset induction remains challenging. In this paper, we propose a Visually Enhanced NeUral Encoder (i.e., VENUE) to learn multimodal representations for the synset induction task. The key insight lies in constructing multimodal representations through intra-modal and inter-modal interactions among images and text. Specifically, we first design a visual interaction module that uses an attention mechanism to capture the correlations among images. To obtain multi-granularity visual representations, we fuse pre-trained tags and word embeddings. Second, we design a masking module to filter out weakly relevant visual information. Third, we present a gating module to adaptively regulate the modalities' contributions to semantics. A triplet loss is adopted to train the VENUE encoder to learn discriminative multimodal representations. Then, we run clustering algorithms on the obtained representations to induce synsets. To verify our approach, we collect a multimodal dataset, i.e., MMAI-Synset, and conduct extensive experiments. The experimental results demonstrate that our method outperforms strong baselines on three groups of evaluation metrics.
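The abstract names four ingredients:

- attention over an instance's images;
- a mask that drops weakly relevant visual information;
- a gate that weighs the textual and visual contributions;
- a triplet loss.

The sketch below illustrates these generic ingredients in plain Python. It is hypothetical and is not VENUE's actual formulation. The following are all assumptions made only for illustration:

- the dot-product scoring;
- the fixed score threshold used as the mask;
- the single scalar sigmoid gate;
- every function name.

```python
from __future__ import annotations

import math

# Hypothetical helpers illustrating ideas named in the abstract.
# None of these names or formulas are taken from the VENUE paper.


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: list[float]) -> list[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention_pool(text_vec: list[float],
                          image_vecs: list[list[float]],
                          threshold: float) -> list[float]:
    """Pool image vectors with text-guided attention.

    Assumption: an image counts as "weakly relevant" when its dot-product
    score with the text vector is below `threshold`; such images are
    masked out before the softmax. Returns a zero vector if every image
    is masked. Image vectors must have the same length as text_vec.
    """
    scores = [dot(text_vec, v) for v in image_vecs]
    kept = [(s, v) for s, v in zip(scores, image_vecs) if s >= threshold]
    if not kept:
        return [0.0] * len(text_vec)
    weights = softmax([s for s, _ in kept])
    return [
        sum(w * v[i] for w, (_, v) in zip(weights, kept))
        for i in range(len(text_vec))
    ]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(text_vec: list[float], visual_vec: list[float],
                 gate_weights: list[float], gate_bias: float) -> list[float]:
    """Assumption: one scalar gate g = sigmoid(w . [t; v] + b) mixes the
    modalities as g * t + (1 - g) * v. gate_weights has length 2 * dim."""
    g = sigmoid(dot(gate_weights, text_vec + visual_vec) + gate_bias)
    return [g * t + (1.0 - g) * v for t, v in zip(text_vec, visual_vec)]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: list[float], positive: list[float],
                 negative: list[float], margin: float = 1.0) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    text = [1.0, 0.0]
    images = [[1.0, 0.0], [0.0, 1.0]]
    visual = masked_attention_pool(text, images, threshold=0.5)
    fused = gated_fusion(text, visual, [0.0] * 4, 0.0)
    print(visual, fused)
```

Under these assumptions, the fused vector for each instance would be trained with `triplet_loss`. The goal is that instances sharing a synset end up closer together than instances from different synsets. A clustering step over the learned vectors then induces the synsets. A learned per-dimension gate or a soft mask would be equally plausible choices. The scalar gate and the hard threshold here only keep the example short.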
Similar resources
Multi-Modal Word Synset Induction
A word in natural language can be polysemous, having multiple meanings, as well as synonymous, meaning the same thing as other words. Word sense induction attempts to find the senses of polysemous words. Synonymy detection attempts to find when two words are interchangeable. We combine these tasks, first inducing word senses and then detecting similar senses to form word-sense synonym sets (syn...
Neural time course of visually enhanced echo suppression.
Auditory spatial perception plays a critical role in day-to-day communication. For instance, listeners utilize acoustic spatial information to segregate individual talkers into distinct auditory "streams" to improve speech intelligibility. However, spatial localization is an exceedingly difficult task in everyday listening environments with numerous distracting echoes from nearby surfaces, such...
Multi-channel Encoder for Neural Machine Translation
Attention-based Encoder-Decoder has the effective architecture for neural machine translation (NMT), which typically relies on recurrent neural networks (RNN) to build the blocks that will be lately called by attentive reader during the decoding process. This design of encoder yields relatively uniform composition on source sentence, despite the gating mechanism employed in encoding RNN. On the...
A Convolutional Encoder Model for Neural Machine Translation
The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence. We present a faster and simpler architecture based on a succession of convolutional layers. This allows to encode the source sentence simultaneously compared to recurrent networks for which computation is constrained by temporal dependencies. On WMT’16 English-Romanian translation w...
Journal
Journal title: Electronics
Year: 2023
ISSN: 2079-9292
DOI: https://doi.org/10.3390/electronics12163521